-
Notifications
You must be signed in to change notification settings - Fork 31
Default and max allowed returned values for collections and items #482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
124170f
to
48e3b2e
Compare
2074b15
to
d6137b8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really good, just a few things.
README.md
Outdated
| `STAC_GLOBAL_ITEM_MAX_LIMIT` | Configures the maximum number of STAC items that can be returned in a single search request. | `100` | Optional | | ||
| `STAC_DEFAULT_ITEM_LIMIT` | Configures the default number of STAC items returned when no limit parameter is specified in the request. | `10` | Optional | | ||
| `STAC_INDEX_ASSETS` | Controls if Assets are indexed when added to Elasticsearch/Opensearch. This allows asset fields to be included in search queries. | `false` | Optional | | ||
| `ENV_MAX_LIMIT` | Configures the environment variable in SFEOS to override the default `MAX_LIMIT`, which controls the limit parameter for returned items and STAC collections. | `10,000` | Optional | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@YuriZmytrakov Do we want to keep ENV_MAX_LIMIT
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like ENV_MAX_LIMIT
is used by max_result_window
, I will need a follow up task to remove the old limit.
Line 819 in 9feecfe
max_result_window = get_max_limit() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if it interferes with the logic in this pr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was your pr: #434
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That’s correct. Since the ENV_MAX_LIMIT
env var is no longer used for controlling amount of returned items/collections, I’ll need to revert my changes from the previous PR. I have also removed it from the README
and updated the changelog
. Apologies for confusion on this topic.
limit = default_limit | ||
|
||
if global_max_limit > 0: | ||
limit = min(limit, global_max_limit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we do this - limit = min(limit, global_max_limit)
- are we ignoring the fact that someone may want a higher global limit? We should write down what the expectations are and then add a test I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe test these cases - let me know what you think:
Test with no global max limit set (should allow any limit).
Test with global max limit set (should cap at that limit).
Test with no limit specified (should use default).
Test with limit higher than global max (should be capped).
Test with limit lower than global max (should be respected).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was agreed that we should set a global limit for /collections
and /items
, as allowing too many results slows down the API. The value can always be overridden by an env var, also I am not sure if we should allow unlimited returned items ever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonhealy1 I have added the tests, removed not relevant test:
Test with no limit specified for items/collections (should use default). -test_default_item_limit_without_limit_parameter_set(), test_default_collection_limit()
Test with limit higher than global max items/collections (should be capped). - test_global_collection_max_limit_set(), test_default_collection_limit()
Test with limit lower than global max (should be respected). - not sure if we need this test.
Test with no global max limit set (should allow any limit). - not relevant, unless agree to allow any limit. Also, overpopulating the test collections/items slows down the test.
Test with global max limit set for items/collection (should cap at that limit). - test_global_collection_max_limit_set(), test_global_item_max_limit_set()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can the value be overridden by an env var? Maybe I am missing something here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Test with no global max limit set (should allow any limit). - not relevant, unless agree to allow any limit. Also, overpopulating the test collections/items slows down the test"
@YuriZmytrakov No one envisions having an api returning unlimited items. You set the default max limit for Items to 100. If a User wants to return 110 items, they are unable to (unless I am missing something, please explain). There is no reason to place these restrictions. People who use this project expect flexibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you test this by showing how the limit param filters down to execute_search - test what its final value is - without ingesting over 100 Items?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed with the team — yes, we should not restrict the number of items, collections that can be returned by default. This was fixed by removing the default values items=100, collections=300. For this reason, I’m adding proper tests to verify that when no global max limit is set (any limit is allowed) populated 20 items, collections and querying to ensure any limit is allowed.
limit = min(limit, global_limit) | ||
else: | ||
limit = global_limit | ||
global_max_limit = int(os.getenv("STAC_GLOBAL_COLLECTION_MAX_LIMIT", 300)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think global_max_limit shouldn't have a default. If it's not set then there is no max limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default max value for collections has been removed.
limit = default_limit | ||
|
||
if global_max_limit > 0: | ||
limit = min(limit, global_max_limit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_search passes limit to post_search, so we should only have to do this in post_search
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like limit
for post_search
was originally populated within the get_search
func, removing populating global and default limits, stops this function to work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please pass the limit as is from get_search - this logic only needs to be in post_search. Make the change, if it's not working, I can help fix it. This is from get_search - you may need to remove the first line?:
limit = int(request.query_params.get("limit", os.getenv("STAC_ITEM_LIMIT", 10)))
base_args = {
"collections": collections,
"ids": ids,
"bbox": bbox,
"limit": limit,
"token": token,
"query": orjson.loads(query) if query else query,
"q": q,
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is correct, there is no need to duplicate the same logic in get_search
and post_search
. I have fixed it.
Raises: | ||
HTTPException: If there is an error with the cql2_json filter. | ||
""" | ||
global_max_limit = int(os.getenv("STAC_GLOBAL_ITEM_MAX_LIMIT", 100)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
concerned here too about setting a default. It may work for some Users but maybe not others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same, it was agreed that the default number of items should be limited to avoid slowing down the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can set this parameter in your api instance, but other people who set up their own api, using this project, may choose to return more than 100 items. Who agreed to this? When you set up your api on your servers, you set this env var, and then everything is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default param of 100 has been removed, thank you!
d89c9e2
to
ab40720
Compare
Raises: | ||
HTTPException: If there is an error with the cql2_json filter. | ||
""" | ||
global_max_limit = int(os.getenv("STAC_GLOBAL_ITEM_MAX_LIMIT", 100)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can set this parameter in your api instance, but other people who set up their own api, using this project, may choose to return more than 100 items. Who agreed to this? When you set up your api on your servers, you set this env var, and then everything is fine.
limit = default_limit | ||
|
||
if global_max_limit > 0: | ||
limit = min(limit, global_max_limit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can the value be overridden by an env var? Maybe I am missing something here?
limit = default_limit | ||
|
||
if global_max_limit > 0: | ||
limit = min(limit, global_max_limit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please pass the limit as is from get_search - this logic only needs to be in post_search. Make the change, if it's not working, I can help fix it. This is from get_search - you may need to remove the first line?:
limit = int(request.query_params.get("limit", os.getenv("STAC_ITEM_LIMIT", 10)))
base_args = {
"collections": collections,
"ids": ids,
"bbox": bbox,
"limit": limit,
"token": token,
"query": orjson.loads(query) if query else query,
"q": q,
}
README.md
Outdated
| `STAC_GLOBAL_ITEM_MAX_LIMIT` | Configures the maximum number of STAC items that can be returned in a single search request. | `100` | Optional | | ||
| `STAC_DEFAULT_ITEM_LIMIT` | Configures the default number of STAC items returned when no limit parameter is specified in the request. | `10` | Optional | | ||
| `STAC_INDEX_ASSETS` | Controls if Assets are indexed when added to Elasticsearch/Opensearch. This allows asset fields to be included in search queries. | `false` | Optional | | ||
| `ENV_MAX_LIMIT` | Configures the environment variable in SFEOS to override the default `MAX_LIMIT`, which controls the limit parameter for returned items and STAC collections. | `10,000` | Optional | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if it interferes with the logic in this pr?
@YuriZmytrakov If this pr is merged there will be 5 environment variables dealing with limit. I think it's too confusing as is. The expectations will need to be documented in the Readme. You may need to allot more time to this issue.
|
I worked on a project where I was fetching hundred of items at a time from mongodb. I divided up the screen into 16 grids and made 16 asynchronous calls to the database to fetch data with a very high limit set. By using the fields extension I was limiting the amount of data sent which was then adjusted at different zoom levels. Zoomed out, I was only fetching geospatial data to display points on a map. But as Users zoomed in, more data was returned. At high zoom levels, Users could click on a point and get the full data for that point. |
484c63b
to
4aa8b7f
Compare
700a222
to
0402484
Compare
…items Ensure collections and items endpoints use their respective default and maximum limit values.
**Related Issue(s):** - None **Description:** Fixed - Issue where token, query param was not being passed to POST collections search logic. - Issue where datetime param was not being passed from POST collections search logic to Elasticsearch. - Collections search tests to ensure both GET /collections and GET/POST /collections-search endpoints are tested. **PR Checklist:** - [x] Code is formatted and linted (run `pre-commit run --all-files`) - [x] Tests pass (run `make test`) - [x] Documentation has been updated to reflect changes, if applicable - [x] Changes are added to the changelog
**Related Issue(s):** - None **Description:** - Add `Latest News` section to readme - Add `CloudFerro` logo to the Sponsors and Supporters section **PR Checklist:** - [x] Code is formatted and linted (run `pre-commit run --all-files`) - [x] Tests pass (run `make test`) - [x] Documentation has been updated to reflect changes, if applicable - [x] Changes are added to the changelog
**Related Issue(s):** - None **Description:** - Make latest news section scrollable, adjust logo size **PR Checklist:** - [x] Code is formatted and linted (run `pre-commit run --all-files`) - [x] Tests pass (run `make test`) - [x] Documentation has been updated to reflect changes, if applicable - [x] Changes are added to the changelog
0402484
to
20fa813
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making the changes. Just one more thing..
body_data = await request.json() | ||
body_limit = body_data.get("limit") | ||
except Exception: | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of using request.query_params.get("limit")
or body_data.get("limit")
, can you just use limit
from the function params?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the limit can be taken from the func params. However, since the default limit=10
, I can’t determine whether the user explicitly provided the ?limit
param or it's default limit=10. The ?limit
parameter should override the default_limit
from the env var.
Option 1: Use a boolean to determine whether the user the ?limit
or limit
param body, and if so, use that value as the limit
in the function.
Option 2: Keep the proposed approach and simply pull the limit from request.query_params.get("limit") or body_data.get("limit").
Let me know your thoughts.
baa3df8
to
7d68424
Compare
Description:
This PR implements returned object limit constraints for collections and items search endpoints which are set in environment variables. The changes ensure consistent behavior between
GET
andPOST
search methods.PR Checklist:
pre-commit run --all-files
)make test
)